[OV] Fix data-free VLM compression via optimum-cli #1058
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
This fix relates to openvinotoolkit/openvino.genai#1348
@nikita-savelyevv, thanks for the PR. Please make sure that the tests you added don't increase the overall validation time dramatically. If they do, please use smaller models instead, e.g. a dummy decoder instead of opt-125m.
The *export* testing time has indeed increased by 4 minutes with this PR (31 min now). But overall OV testing time is still limited by the *diffusion* tests, which take 33 min. I think we should address this in the near future, but it can be done in a separate PR.
@nikita-savelyevv I tested the model exported with optimum-cli built from this branch using the code from the GenAI README: https://github.com/openvinotoolkit/openvino.genai?tab=readme-ov-file#run-generation-using-vlmpipeline-api-in-python. I exported with just … I tested on Xeon, and also tried with the f32 INFERENCE_PRECISION_HINT, which did not make a difference.
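(For reference, one way to force f32 inference precision is sketched below. This is an editor's sketch using optimum-intel's `ov_config`, not the commenter's exact GenAI setup; the model path is a placeholder.)

```python
# Sketch only: forcing f32 inference precision through optimum-intel's ov_config.
# The commenter tested via OpenVINO GenAI; this shows an equivalent knob on the
# optimum-intel side. The model path below is a placeholder.
from optimum.intel import OVModelForVisualCausalLM

model = OVModelForVisualCausalLM.from_pretrained(
    "MiniCPM-V-2_6",  # placeholder: path to the exported model directory
    trust_remote_code=True,
    ov_config={"INFERENCE_PRECISION_HINT": "f32"},
)
```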
This is interesting. When running inference via optimum-intel I don't get an empty response. But when running inference via OpenVINO GenAI, I do. My code:

```python
import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image
from transformers import AutoTokenizer, AutoProcessor
from optimum.intel import OVModelForVisualCausalLM

model_path = "/home/nsavel/workspace/optimum-intel/MiniCPM-V-2_6"
image_file = "dog.jpg"
prompt = "Can you describe the image?"

# optimum-intel inference
raw_image = Image.open(image_file)
model = OVModelForVisualCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
inputs = model.preprocess_inputs(text=prompt, image=raw_image, processor=processor, tokenizer=tokenizer)
generation_kwargs = dict(max_new_tokens=100, do_sample=False)
output = model.generate(**inputs, **generation_kwargs)
print("optimum-intel:", processor.decode(output[0], skip_special_tokens=True))

# openvino.genai inference
pipe = ov_genai.VLMPipeline(model_path, "CPU")
image = Image.open(image_file)
image_data = np.array(image.getdata()).reshape(1, image.size[1], image.size[0], 3).astype(np.uint8)
image_data = ov.Tensor(image_data)
print("\nopenvino.genai:", pipe.generate(prompt, image=image_data, max_new_tokens=100))
```

Output:
When the model is compressed with …
This is also the case when compressing with …
Thanks for the sample code @nikita-savelyevv, I re-exported the model with group size 16 with your PR and observe the same as you did. OpenVINO GenAI inference works fine with group size 16, but not without it, both with and without your PR. Tested on Xeon with nightly/dev versions of openvino-genai and nncf.
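(To make the two cases discussed above concrete, here is an editor's sketch of the corresponding compression configurations expressed through optimum-intel's Python API. The exact CLI flags used by the commenters are not captured in this thread, so the config values below are assumptions; either config would be passed as `quantization_config` to `OVModelForVisualCausalLM.from_pretrained`, as in the example in the PR description below.)

```python
# Sketch only: reconstruction of the two int4 compression setups discussed above.
from optimum.intel import OVWeightQuantizationConfig

# Default int4 compression (no explicit group size) -- the case where
# OpenVINO GenAI reportedly returns an empty generation result.
default_int4 = OVWeightQuantizationConfig(bits=4)

# int4 compression with group size 16 -- the case reported to work with OpenVINO GenAI.
int4_gs16 = OVWeightQuantizationConfig(bits=4, group_size=16)
```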
@echarlaix, @IlyasMoutawwakil, PR is ready for your review. |
@helena-intel I've created ticket 159295 on OV GenAI to examine the empty generation result.
@echarlaix @IlyasMoutawwakil could you please review this PR sometime this week? I'm on vacation starting next week. Thanks!
@IlyasMoutawwakil, @echarlaix, kindly take a look at the PR, as @nikita-savelyevv will be out until the new year.
What does this PR do?
Changes
When exporting an `image-text-to-text` model with optimum-cli in int4, all model components were compressed to int4. However, only the language model should be compressed to int4, while the other components should be compressed to int8_sym. The fix is to make VLM data-free compression run inside the `from_pretrained` call, similar to the data-aware case for LMs.
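As an illustration of the intended behavior, a minimal sketch is shown below. This is an editor's sketch, not code from the PR; the model id and output directory are placeholders.

```python
# Minimal sketch of data-free int4 VLM compression running inside from_pretrained,
# which is what this PR intends. Model id and output path are placeholders.
from optimum.intel import OVModelForVisualCausalLM, OVWeightQuantizationConfig

model = OVModelForVisualCausalLM.from_pretrained(
    "openbmb/MiniCPM-V-2_6",  # placeholder model id
    export=True,
    trust_remote_code=True,
    quantization_config=OVWeightQuantizationConfig(bits=4),
)
# Expected outcome after this fix: the language model is compressed to int4,
# while the other sub-models (e.g. vision encoder, embeddings) are int8_sym.
model.save_pretrained("MiniCPM-V-2_6-int4")
```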
Tests
Introduced additional checks for low-precision weight nodes of the pipeline sub-models. This should prevent similar issues in the future.
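For readers unfamiliar with such checks, the idea is roughly the following. This is an editor's sketch, not the PR's actual test code, and the sub-model file name is an assumption about the export layout.

```python
# Sketch: count low-precision weight (Constant) nodes in an exported OpenVINO sub-model.
from collections import Counter
import openvino as ov

def count_low_precision_weights(model: ov.Model) -> Counter:
    counts = Counter()
    for op in model.get_ops():
        if op.get_type_name() == "Constant":
            dtype = op.get_element_type().get_type_name()
            if dtype in {"u4", "i4", "u8", "i8"}:
                counts[dtype] += 1
    return counts

core = ov.Core()
# Assumed file name for the language model IR inside the export directory.
lm = core.read_model("MiniCPM-V-2_6-int4/openvino_language_model.xml")
print(count_low_precision_weights(lm))  # expect int4 weights here, int8 in other sub-models
```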
Before submitting